home *** CD-ROM | disk | FTP | other *** search
- Subject: v16i070: Spiff, find approximate differences in files, Part04/04
- Newsgroups: comp.sources.unix
- Sender: sources
- Approved: rsalz@uunet.UU.NET
-
- Submitted-by: Daniel W Nachbar <daniel@wind.bellcore.com>
- Posting-number: Volume 16, Issue 70
- Archive-name: spiff/part04
-
- #! /bin/sh
- # This is a shell archive. Remove anything before this line, then unpack
- # it by saving it into a file and typing "sh file". To overwrite existing
- # files, type "sh file -c". You can also feed this as standard input via
- # unshar, or by typing "sh <file", e.g.. If this archive is complete, you
- # will see the following message at the end:
- # "End of archive 4 (of 4)."
- # Contents: paper.ms
- # Wrapped by rsalz@papaya.bbn.com on Fri Nov 11 13:12:27 1988
- PATH=/bin:/usr/bin:/usr/ucb ; export PATH
- if test -f 'paper.ms' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'paper.ms'\"
- else
- echo shar: Extracting \"'paper.ms'\" \(25154 characters\)
- sed "s/^X//" >'paper.ms' <<'END_OF_FILE'
- X.ll 6i
- X.nr PO 1.15i
- X.nr HM 0i
- X.nr FM 1.05i
- X.TL
- XSPIFF -- A Program for Making Controlled Approximate Comparisons of Files
- X.AU
- XDaniel Nachbar
- X.AI
- XSoftware Engineering Research Group
- XBell Communications Research
- XMorristown, New Jersey
- X.AB
- XThe well known program
- X.B
- Xdiff
- X.R
- X[1]
- Xis inappropriate for some
- Xcommon tasks such as comparing the output of floating
- Xpoint calculations where roundoff errors
- Xlead
- X.B
- Xdiff
- X.R
- Xastray and comparing program source code
- Xwhere some differences in the text (such as white space and comments)
- Xhave no effect on the operation of the compiled code. A new program,
- Xnamed
- X.B
- Xspiff,
- X.R
- Xaddresses these and other similar cases
- Xby lexical parsing of the input files and then applying
- Xa differencing algorithm to the token sequences.
- X.B
- XSpiff
- X.R
- Xignores differences
- Xbetween floating point numbers that are below a user settable tolerance.
- XOther features include user settable commenting and literal string
- Xconventions and a choice of differencing algorithm.
- XThere is also an interactive mode wherein the input texts are displayed
- Xwith differences highlighted. The user can change numeric tolerances
- X"on the fly" and
- X.B
- Xspiff
- X.R
- Xwill adjust the highlighting accordingly.
- X.AE
- X.SH
- XSome Troubles With Diff
- X.PP
- XOver the past several years, it has been fairly easy to tell when
- Xa new type of computer arrived at a nearby computer center.
- XThe best clue was the discordant chorus of
- Xgroaning, sighing, gnashing of teeth, pounding of foreheads on desks,
- Xand other sounds of distress. Tracing these noises to their source, one
- Xwould find some poor soul in the process of installing
- Xa numerical analysis package on the new machine.
- X.PP
- XOne might expect that "moving up" to a new machine
- Xwould be a cause for celebration.
- XAfter all, new machines are typically bigger, faster,
- Xand better than old machines.
- XHowever, the floating point arithmetic on any new machine is frequently
- Xslightly different from any old machine.
- XAs a consequence,
- Xsoftware package test routines produce output that is slightly different,
- Xbut still correct, on the new machines.
- XSerious troubles appear when the person installing the software package
- Xattempts to compare the test output files from two different machines
- Xby using a difference finding program such as
- X.B
- Xdiff.
- X.R
- XPrograms such as
- X.B
- Xdiff
- X.R
- Xdo a character by character comparison.
- X.B
- XDiff
- X.R
- Xfinds a great many differences, most of which
- Xare due to roundoff errors in the least significant digits of floating point
- Xnumbers. Others are the result of differences in the way
- Xin which the two test runs
- Xhad printed a number (3.4e-1 vs. 0.34).
- XIn one case, the test suite for the S statistical analysis package[2],
- Xover 1700 floating point numbers are produced
- X(per machine). In the eyes of
- X.B
- Xdiff,
- X.R
- Xroughly 1200 of these numbers are different.
- XHowever, none of the "differences" are important ones.
- XNonetheless, software installers wind up inspecting the output by eye.
- X.PP
- XA similar problem arises when one attempts to
- Xlook for differences between two versions
- Xof the same C program.
- X.B
- XDiff
- X.R
- Xreports many differences that are not of interest. In
- Xparticular, white space (except inside quotation marks) and
- Xanything inside a comment have no effect on the operation of the compiled
- Xprogram and are usually not of interest.
- X.B
- XDiff
- X.R
- Xdoes have a mode of operation where white space
- Xwithin a line (spaces and tabs) can be ignored.
- XHowever, differences in the placement of newlines cannot be ignored.
- XThis is particularly annoying since C programming
- Xstyles differ on whether to place a newline character before or after the '{'
- Xcharacters that start blocks.
- X.SH
- XThe Problem in General Terms
- X.PP
- XAs already mentioned, programs such as
- X.B
- Xdiff
- X.R
- Xdo
- Xa character-by-character comparison of the input files.
- XHowever, when it comes to interpreting
- Xthe contents of a file (either by a human or by a program)
- Xit is almost never the case that characters
- Xare treated individually. Rather, characters make up tokens such
- Xas words and numbers, or act as separators between these tokens.
- XWhen comparing files, one is usually looking for
- Xdifferences between these tokens, not the characters that make them up
- Xor the characters that separate them.
- X.PP
- XWhat is needed is a program that first parses the input files
- Xinto tokens, and then applies a differencing algorithm to the token
- Xsequences.
- XIn addition to finding differences in terms of tokens,
- Xit is possible to interpret the tokens and
- Xcompare different types of tokens in different ways. Numbers, for example,
- Xcan differ by a lot or a little.\**
- X.FS
- XCurrent differencing programs do not have such a notion because
- Xthe difference between two characters is a binary function.
- XTwo characters are the same or they are not.
- X.FE
- XIt is possible to use a tolerance when comparing two number tokens and
- Xreport only those differences that exceed the tolerance.
- X.SH
- XDesign Issues
- X.PP
- XA serious design issue for such a program is how
- Xcomplex to make the parse. The
- X.I
- Xdeeper
- X.R
- Xone goes in the parsing the larger
- Xthe unit of text that can be manipulated. For instance, if one is looking
- Xfor differences in C code, a complete parse tree can be produced and
- Xthe differencing algorithm could examine insertion and deletion of entire
- Xbranches of the tree. However, deep parsing requires much more
- Xcomplex parsing and slower differencing algorithms.
- X.PP
- XAnother design issue is deciding how to interpret the tokens.
- XCloser interpretation may lead to greater flexibility in comparing tokens, but
- Xalso results in a more cumbersome and error-prone implementation.
- X.PP
- XIn the program described here, we attempt to keep both the depth
- Xof the parse and the semantics of the tokens to a minimum.
- XThe parse is a simple
- Xlexical parse with the input files broken up into one dimensional
- Xsequences of numbers, literal strings and white space.
- XLiteral strings and white space are not interpreted. Numbers
- Xare treated as representing points on the real number line.
- X.SH
- XDefault Operation
- X.PP
- X.B
- XSpiff\**
- X.R
- X.FS
- XWe picked the name as a way to pay a small tribute to that famous intergalactic
- Xadventurer Spaceman Spiff[3].
- X.B
- XSpiff
- X.R
- Xis also a contraction of "spiffy diff".
- X.FE
- Xworks very much like
- X.B
- Xdiff.
- X.R
- XIt reads two files, looks
- Xfor differences, and prints a listing of the
- Xdifferences in the form of
- Xan edit script.\**
- X.FS
- XAn edit script is a sequence of insertions and deletions
- Xthat will transform the first file into the second.
- X.FE
- XAs already suggested,
- X.B
- Xspiff
- X.R
- Xparses the files into
- Xliteral strings and real numbers.
- XThe definition of these tokens can be altered somewhat by the user
- X(more on this later). For now, suffice it
- Xto say that literals are strings like "cow", "sit",
- X"into", etc. Real numbers look like "1.3", "1.6e-4" and so on.
- XAll of the common formats for real numbers are recognized.
- XThe only requirements for a string to be
- Xtreated as a real number is the presence
- Xof a period and at least one digit.
- XBy default, a string of digits without a decimal point
- X(such as "1988") is not considered to be a real number,
- Xbut rather a literal string.\**
- XEach non-alphanumeric character (such as #$@^&*)
- Xis parsed into a separate literal token.
- X.FS
- XInteger numbers are often used as indices, labels, and so on.
- XUnder these circumstances, it is more appropriate to treat them as literals.
- XOur choice of default was driven by a design goal
- Xof having
- X.B
- Xspiff
- X.R
- Xbe very conservative
- Xwhen choosing to ignore differences.
- X.FE
- X.PP
- XOnce
- X.B
- Xspiff
- X.R
- Xdetermines the two sequences of tokens,
- Xit compares members of the first sequence with
- Xmembers of the second sequence.
- XIf two tokens are of different types,
- X.B
- Xspiff
- X.R
- Xdeems them to be different, regardless of their content.
- XIf both tokens are literal tokens,
- X.B
- Xspiff
- X.R
- Xwill deem them
- Xto be different if any of their characters differ.
- XWhen comparing two real numbers,
- X.B
- Xspiff
- X.R
- Xwill deem them to be different only if
- Xthe difference in their values exceeds a user settable tolerance.
- X.SH
- XAltering Spiff's Operation
- X.PP
- XTo make
- X.B
- Xspiff
- X.R
- Xmore generally useful, the user can control:
- X.IP \(bu
- Xhow text strings are parsed into tokens
- X.IP \(bu
- Xhow tokens of the same type are compared
- X.IP \(bu
- Xthe choice of differencing algorithm used
- X.IP \(bu
- Xand the granularity of edit considered by the differencing algorithm.
- X.LP
- X.PP
- XThese features are described next.
- X.SH
- XAltering the Parse
- X.PP
- XThe operation of the parser can be altered in several ways.
- XThe user can specify that delimited sections of text are to be ignored
- Xcompletely. This is useful for selectively ignoring the contents of
- Xcomments in programs. Similarly, the user can specify that
- Xdelimited sections of text (including white space)
- Xbe treated as a single literal token. So, literal strings in program
- Xtext can be treated appropriately.
- XMultiple sets of
- Xdelimiters may be specified at once (to handle cases such as the
- XModula-2 programming language
- Xwhere there are two ways to specify quoted strings). At present,
- Xthe delimiters must be fixed string (possibly restricted to the
- Xbeginning of the line) or end of line.
- XAs a consequence of the mechanism for specifying literal strings,
- Xmulticharacter operators (such as the += operator in C)
- Xcan be parsed into a single token.
- X.PP
- XAs yet, no provision is made for allowing delimiter
- Xspecification in terms of regular expressions. This omission
- Xwas made for the sake of simplifying the parser.
- XNothing prevents the addition of regular expressions in the
- Xfuture. However, the simple mechanism
- Xalready in place handles the literal string and commenting conventions
- Xfor most well known programming languages.\**
- X.FS
- XSee the manual page in the appendix for examples of handling
- XC, Bourne Shell, Fortran, Lisp, Pascal, and Modula-2. The only
- Xcases that are known not to work are comments in BASIC and
- XHollerith strings in Fortran.
- X.FE
- X.PP
- XIn addition to controlling literal string and comments, the user
- Xmay also specify whether to treat white space characters as any other
- Xnon-alphanumeric character (in other words, parse each white space
- Xcharacter into its own literal token),
- Xwhether to parse sign markers as part
- Xof the number that they precede or as separate tokens, whether
- Xto treat numbers without printed decimal markers (e.g. "1988")
- Xas real numbers rather than as literal strings, and whether
- Xto parse real numbers into literal tokens.
- X.SH
- XAltering the Comparison of Individual Tokens
- X.PP
- XAs mentioned earlier, the user can set a tolerance below which differences
- Xbetween real numbers are ignored.
- X.B
- XSpiff
- X.R
- Xallows two kinds of tolerances:
- Xabsolute and relative.
- XSpecifying an absolute tolerance will cause
- X.B
- Xspiff
- X.R
- Xto ignore differences
- Xthat are less than the specified value.
- XFor instance, specifying an absolute tolerance of 0.01 will
- Xcause only those differences greater than or equal to 0.01 to be reported.
- XSpecifying a relative tolerance will cause
- X.B
- Xspiff
- X.R
- Xto ignore differences that are
- Xsmaller than some fraction of the number of larger magnitude.
- XSpecifically, the value of the tolerance is interpreted
- Xas a fraction of the larger (in absolute terms)
- Xof the two floating point numbers being compared.
- XFor example,
- Xspecifying a relative tolerance of 0.1
- Xwill cause the two floating point numbers 1.0 and 0.91 to be deemed within
- Xtolerance. The numbers 1.0 and 0.9 will be outside the tolerance.
- XAbsolute and relative tolerances can be OR'ed together. In fact,
- Xthe most effective way to ignore differences that are due to roundoff errors
- Xin floating point calculations is to use both
- Xa relative tolerance (to handle limits in precision) as well as an absolute
- Xtolerance (to handle cases when one number is zero and the other number is
- Xalmost zero).\**
- X.FS
- XAll numbers differ from zero by 100% of their magnitude. Thus, to handle
- Xnumbers that are near zero, one would have to specify a relative tolerance
- Xof 100% which would be unreasonably large when both numbers are non-zero.
- X.FE
- XIn addition, the user can specify an infinite tolerance. This is useful
- Xfor checking the format of output while ignoring the actual numbers
- Xproduced.
- X.SH
- XAltering the Differencing Algorithm
- X.PP
- XBy default,
- X.B
- Xspiff
- X.R
- Xproduces a minimal edit sequence (using the Miller/Myers differencing algorithm[4])
- Xthat will convert the first file into the second.
- XHowever, a minimal edit sequences is not always desirable.
- XFor example, for the following two tables of numbers:
- X.DS
- X0.1 0.2 0.3 0.2 0.3 0.4
- X0.4 0.5 0.6 0.5 0.6 0.7
- X.DE
- Xa minimal edit sequence to convert the table on
- Xthe left into the table on the right be to
- Xwould delete the first number (0.1) and insert 0.7 at the end.\**
- X.FS
- XThe problem of having the elements of tables become misaligned when
- Xthe differencing algorithm is trying
- Xto find a minimal number of edits can be reduced somewhat
- Xby retaining newlines and not using tolerances.
- XUnfortunately, it does not go away.
- X.FE
- XSuch a result, while logically correct, does not provide a good picture
- Xof the differences between the two files.
- XIn general, for text with a very definite structure (such as tables),
- Xwe may not want to consider insertions and deletions at all, but
- Xonly one-to-one changes.\**
- X.FS
- XA "change" can be expressed as one deletion and one insertion at the same
- Xpoint in the text.
- X.FE
- XSo, rather than look for a minimal edit script, we
- Xmerely want to compare each token in the first file with
- Xthe corresponding token in the second file.
- X.PP
- XThe user can choose which differencing algorithm to use
- X(the default Miller/Myers or
- Xthe alternative one-to-one comparison)
- Xbased upon what is known about the input files. In general,
- Xfiles produced mechanically
- X(such the output from test suites) have a very regular structure
- Xand the one-to-one comparison works surprisingly well.
- XFor files created by humans, the Miller/Myers
- Xalgorithm is more appropriate.
- XThere is nothing in
- X.B
- Xspiff's
- X.R
- Xinternal design that limits
- Xthe number of differencing algorithms that it can run.
- XOther differencing algorithms,
- Xin particular the one used in
- X.B
- Xdiff,
- X.R
- Xwill probably be added later.
- X.SH
- XAltering the Granularity of the Edit Sequence
- X.PP
- XBy default,
- X.B
- Xspiff
- X.R
- Xproduces an edit sequence
- Xin terms of insertions and deletions of individual tokens.
- XAt times it may be more useful to
- Xtreat the contents of the files as tokens when looking for differences
- Xbut
- Xexpress the edit script in terms of entire lines of the files rather
- Xthan individual tokens.\**
- X.FS
- XFor instance, if one wants to have
- X.B
- Xspiff
- X.R
- Xproduce output that can be fed into
- Xthe
- X.B
- Xed
- X.R
- Xeditor.
- X.FE
- X.B
- XSpiff
- X.R
- Xprovides a facility for restricting the edits to entire lines.
- X.SH
- XTreating Parts of the Files Differently
- X.PP
- XFor complex input files, it is important that different parts of the
- Xfile be treated in different ways. In other words, it may be impossible
- Xto find one set of parsing/differencing rules that work well for the
- Xentire file.
- X.B
- XSpiff
- X.R
- Xcan differentiate between parts of the input files on two bases:
- Xwithin a line and between lines.
- XWithin a line, a different tolerance can be applied to each real number.
- XThe tolerances are specified in terms of the ordinal position of the
- Xnumbers on the line (i.e. one tolerance is applied to the first real number
- Xon each line, a different tolerance is applied to the second number on
- Xeach line, a third tolerance is applied to the third, and so on). If more
- Xnumbers appear on a line than there are tolerances specified, the last
- Xtolerance is applied to all subsequent numbers on the line (i.e., if the user
- Xspecifies three tolerances, the third is applied to the third, fourth
- Xfifth, . . . number on each line). This feature is useful for applying
- Xdifferent tolerances to the different columns of a table of numbers.
- X.PP
- XBetween lines, the user can place "embedded commands" in the input files.
- XThese commands
- Xare instructions to parser that can change what tolerances are attached
- Xto real numbers and the commenting and literal string conventions used by the
- Xparser. Embedded commands are flagged to the parser
- Xby starting the line with a user-specified
- Xescape string. By combining within line and between line differentiation,
- Xit is possible for the user to specify a different tolerance
- Xfor every single real number in the input files.
- X.SH
- XVisual Mode
- X.PP
- XSo far,
- X.B
- Xspiff's
- X.R
- Xoperation as an intelligent filter has been described.
- X.B
- XSpiff
- X.R
- Xalso has an interactive mode.
- XWhen operating in interactive mode,
- X.B
- Xspiff
- X.R
- Xplaces corresponding sections of the input files
- Xside by side on user's screen.\**
- X.FS
- XAlthough the current implementation of
- X.B
- Xspiff
- X.R
- Xruns in many environments,
- Xinteractive mode works only under the MGR window manager.[5]
- XOther graphics interfaces will probably be added over time.
- X.FE
- XTokens are compared using a one-to-one ordinal comparison, and any tokens that
- Xare found to be different are highlighted in reverse video.
- XThe user can interactively change the tolerances and
- X.B
- Xspiff
- X.R
- Xwill alter the display
- Xto reflect which real numbers exceed the new tolerances.
- XOther commands allow the user to page through the file and exit.
- X.SH
- XPerformance
- X.PP
- XTwo components of
- X.B
- Xspiff,
- X.R
- Xthe parser and the differencing algorithm,
- Xaccount for most of the execution time. Miller and Myers compare their
- Xalgorithm to the one used in the diff program. To restate their results,
- Xthe Miller/Myers algorithm is faster for files
- Xthat have relatively few differences but much
- Xslower (quadratic time) for files with a great many differences.
- X.PP
- XFor cases where the files do not differ greatly,
- Xparsing the input files takes most of the time (around 80% of the total).\**
- X.FS
- XNo effort has yet been made to make the parser run more quickly.
- XA faster parser could no doubt be written by generating a special state machine.
- X.FE
- XThe performance of the parser is roughly similar to programs that do a similar
- Xlevel of parsing (i.e. programs that must examine each character in the file).
- XFor files where roughly half of the tokens are real numbers,
- X.B
- Xspiff
- X.R
- Xtakes about twice as long to parse the input files
- Xas an
- X.B
- Xawk
- X.R
- Xprogram that counts the number of words in a file:\**
- X.FS
- XFor
- X.B
- Xawk,
- X.R
- Xa word is any string separated by white space.
- X.FE
- X.B
- X.DS
- Xawk '{total += NF}' firstfile secondfile
- X.DE
- X.R
- X.PP
- XThe time that it takes
- X.B
- Xspiff
- X.R
- Xto parse a file is substantially
- Xincreased if scanning is done for comments
- Xand delimited literal strings. The precise effect depends upon the length of
- Xthe delimiters, whether they are restricted to appear at beginning of line, and
- Xthe frequency with which literals and comments appear in the input files.
- XAs an example, adding the 12 literal conventions\**
- X.FS
- XOne literal convention is for C literal strings. The rest enumerate multicharacter
- Xoperators.
- X.FE
- Xand 1 commenting convention
- Xrequired for C code roughly doubles the time required to parse input files.\**
- X.FS
- XSo in total, it takes
- X.B
- Xspiff
- X.R
- Xabout 4 times longer to parse a C program than it takes
- X.B
- Xawk
- X.R
- Xto count the number of words in the same file.
- X.FE
- X.PP
- XA more complete approach to evaluating
- X.B
- Xspiff's
- X.R
- Xperformance must measure the total time that it takes for the user to complete a
- Xdifferencing task. For example, consider one of the
- Xtest suites for the S statistical
- Xanalysis package mentioned at the beginning of this paper.
- XThe output file for each machine is 427 lines long and contains
- X1090 floating point numbers. It takes
- X.B
- Xdiff
- X.R
- Xapproximately 2 seconds on one of our "6 MIPS"\** computers
- X.FS
- XWe will not comment on the usefulness of "MIPS" as a measure
- Xof computing speed. The numbers provided are only intended to
- Xgive the reader some vague idea of how fast these programs run.
- X.FE
- Xto compare the two files and produce
- Xan edit script that is 548 lines long containing 1003 "differences"
- Xin the floating point numbers. It takes the average tester
- X5 minutes to print out the edit script and roughly 2 hours to examine
- Xthe output by hand to determine that the machines are, in fact,
- Xboth giving nearly identical answers. The total time needed is
- X2 hours 5 minutes and 2 seconds.
- X.PP
- XIn contrast, it takes
- X.B
- Xspiff
- X.R
- Xapproximately 6 seconds on one of our "6 MIPS" computers to
- Xproduce an output file that is 4 lines long.\**
- X.FS
- XThe output would be zero length except that the output of the
- X.B
- Xtime
- X.R
- Xcommand is built into the S tests.
- XThe timing information could easily be ignored using
- X.B
- Xspiff's
- X.R
- Xembedded commands. But, as we shall see, it hardly seems worth the trouble.
- X.FE
- XIt takes the average tester 30 seconds to examine
- X.B
- Xspiff's
- X.R
- Xoutput. The total for
- X.B
- Xspiff
- X.R
- Xis 36 seconds. Therefore for this case,
- X.B
- Xspiff
- X.R
- Xwill get the job done roughly 208.88 times faster than
- X.B
- Xdiff.
- X.R
- X.PP
- XIn general, it is misleading to compare
- X.B
- Xspiff's
- X.R
- Xspeed with that of
- X.B
- Xdiff.
- X.R
- XWhile both programs are looking for differences between files,
- Xthey operate on very different types of data (tokens vs. bytes).
- XAn analogous comparison could be made between the speed of an assembler
- Xand the speed of a C compiler. They are both language translators.
- XOne runs much faster than the other.
- XNone the less, most programmers use the slower program
- Xwhenever possible.
- X.SH
- XUsing Spiff For Making Regression Tests Of Software
- X.PP
- XWe envision
- X.B
- Xspiff
- X.R
- Xto be the first of several tools for aiding in the now
- Xarduous task of making regression tests.\**
- X.FS
- XIn software engineering parlance, a "regression test" is the process by
- Xwhich a tester checks to make sure that the new version of a piece of
- Xsoftware still performs the same way as the older versions
- Xon overlapping tasks.
- X.FE
- XGiven
- X.B
- Xspiff's
- X.R
- Xcurrent capabilities, the regression test designer can
- Xtake the output of an older version of software and through
- Xthe use of literal string and commenting conventions,
- Xspecify what parts of the output must remain identical and
- Xwhat sections can change completely. By specifying tolerances, the test
- Xdesigner can take into account how much of a difference in floating
- Xpoint calculations is acceptable.
- X.PP
- XThe test designer is also free to
- Xedit the output from the older version of the software and add embedded
- Xcommands that can instruct
- X.B
- Xspiff
- X.R
- Xto treat various parts of the output
- Xdifferently. The newly edited output can then serve as a template for
- Xthe output of later versions of the software.
- X.PP
- XObviously, editing output by hand is a very low level mechanism for adding
- Xspecification information. It is our intention that
- X.B
- Xspiff
- X.R
- Xwill become
- Xthe last element in a pipeline of programs. Programs (as yet unwritten) located
- Xearlier in the pipeline
- Xcan implement a higher level representation of the specification information.
- XThey read in the old and new input files, add the appropriate embedded commands,
- Xand then pass the results to
- X.B
- Xspiff
- X.R
- Xwhich will do the actual differencing.
- X.SH
- XFuture Work
- X.PP
- XThere are many features that could be added to
- X.B
- Xspiff
- X.R
- X(if there are not
- Xtoo many already). Some of these include:
- X.IP \(bu
- XUsing separate differencing algorithms on separate sections of the file
- Xand/or limiting the scope of an edit sequence (fencing)
- X.IP \(bu
- XProviding a more general mechanism for specifying comments and literals
- X(perhaps allowing specification in terms of regular expressions).
- XAs yet, we have not encountered any important cases where regular expressions
- Xhave been needed. Until such a case is encountered, we will leave regular
- Xexpressions out in the name of simplicity.
- X.IP \(bu
- XAllowing for a more general specification of what lines should look like.
- XAt present, the user can only specify tolerances for numbers as a function
- Xof their ordinal position on a line. The difficulty in expanding the
- Xspecification abilities of
- X.B
- Xspiff
- X.R
- Xis knowing when to stop. In the extreme,
- Xwe might add all of the functionality of a program such as
- X.B
- Xawk.\**
- X.R
- X.FS
- XImagine handling the case such as
- X"apply this tolerance to all numbers that appear
- Xon a line starting with the word `foo' but only if the number is between 1.9
- Xand 3.6 and the word `bar' does not appear on the line".
- X.FE
- XWe hope to keep
- X.B
- Xspiff
- X.R
- Xas simple as possible. Our first efforts in
- Xthis direction will try to implement higher level specification functions
- Xoutside of
- X.B
- Xspiff.
- X.R
- X.SH
- XAcknowledgements
- X.PP
- XFirst and foremost, we thank Stu Feldman for his endless patience, constant encouragement
- Xand numerous good ideas. We also extend thanks to Doug McIlroy for bringing the Miller/Myers
- Xalgorithm to our attention, Nat Howard for a key insight
- Xand for his editorial comments
- Xand Steve Uhler and Mike Bianchi for their editorial comments.
- X.SH
- XReferences
- X.IP [1]
- XHunt,J.W. and M.D. McIlroy.
- X.I
- XAn Algorithm For Differential File Comparisons,
- X.R
- X.B
- XBell Labs Computer Science Technical Report,
- X.R
- XNumber 41, 1975.
- X.IP [2]
- XBecker,R.A. and J.M. Chambers (1984).
- X.B
- XS \- An Interactive Environment For Data Analysis And
- XGraphics.
- X.R
- XBelmont, CA: Wadsworth Inc.
- X.IP [3]
- XWatterson, B. (1987).
- X.B
- XCalvin and Hobbes.
- X.R
- XNew York: Andrews, McMeel & Parker.
- X.IP [4]
- XMiller, W. and E.W. Myers.
- X.I
- XA File Comparison Program,
- X.R
- X.B
- XSoftware \-
- XPractice and Experience
- X.R
- X15, 11, 1025-1040, 1985.
- X.IP [5]
- XUhler, S.A.
- X.I
- XMGR -- A Window Manager For UNIX,
- X.R
- XSun User's Group Meeting. September 1986.
- X.LP
- END_OF_FILE
- if test 25154 -ne `wc -c <'paper.ms'`; then
- echo shar: \"'paper.ms'\" unpacked with wrong size!
- fi
- # end of 'paper.ms'
- fi
- echo shar: End of archive 4 \(of 4\).
- cp /dev/null ark4isdone
- MISSING=""
- for I in 1 2 3 4 ; do
- if test ! -f ark${I}isdone ; then
- MISSING="${MISSING} ${I}"
- fi
- done
- if test "${MISSING}" = "" ; then
- echo You have unpacked all 4 archives.
- rm -f ark[1-9]isdone
- else
- echo You still need to unpack the following archives:
- echo " " ${MISSING}
- fi
- ## End of shell archive.
- exit 0
-